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Summary 

We consider inference for the reaction rates in discretely observed networks 
such as those found in models for systems biology, population ecology and 
epidemics. Most such networks are neither slow enough nor small enough for 
inference via the true state-dependent Markov jump process to be feasible. 
Typically, inference is conducted by approximating the dynamics through 
an ordinary differential equation (ODE), or a stochastic differential equation 
(SDE). The former ignores the stochasticity in the true model, and can lead 
to inaccurate inferences. The latter is more accurate but is harder to im- 
plement as the transition density of the SDE model is generally unknown. 
The Linear Noise Approximation (LNA) is a first order Taylor expansion of 
the approximating SDE about a deterministic solution. It can be viewed as 
a compromise between the ODE and SDE models. It is a stochastic model, 
but discrete time transition probabilities for the LNA are available through 
the solution of a series of ordinary differential equations. We describe how 
the LNA can be used to perform inference for a general class of reaction net- 
works; evaluate the accuracy of such an approach; and show how and when 
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this approach is either statistically or computationally more efficient than 
ODE or SDE methods. 



1 Introduction 



Reaction networks are used to model a wide variety of real-world phenom- 
ena; they describe a probabilistic mechanism for the joint evolution of one 
or more populations of species. These species may be biological s pecie s , such 
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such as predators and their prey (e.g lBoys et all 120081 ; iFerm et a/.l . l2008l ). or 



Proctor et all |2005[). animal spec ies, 



interacting groups of humans or animals such as those infected with a partic- 
ular disease , those susceptible to the disease and those who have recovered 



from it (e.g lAndersson and Brittonl . |2000| ; iBall and Neall . l2008t iJewell et al. 
2009( 1 . 



The evolution of these networks is most naturally modelled via a continuous- 
time jump Markov process. The current state of the system is encapsulated 
in a vector giving the numbers of each species that are present. The evolution 
of the state is described by a series of reactions, such as the interaction of 
two copies of a protein producing a dimer of that protein; or an interaction 
between an infected individual and a susceptible individual resulting in the 
susceptible becoming infected. Occurences of a given reaction are modelled 
as a Poisson process, the rate of which depends on the current state of the 
system. Interest lies in inferring the parameters that govern the rate of each 
reaction from data on the evolution of the system. 

This article concerns inference for the rate parameters in discretely observed 
reaction networks allowing that observations may contain noise and may be of 
only a subset of the species in the system. Inference under the jump Markov 
process is possible only for networks which involve few spec ies, few rea c tions, 



and not "too many" transi tions between observations (e.g. iBoys et all 12008 



Amrein and Kunschl . |2012| ) . For most systems it is necessary to approximate 
the evolution of the process to make inference computationally feasible. Of- 
ten this will involve approximatin g the evolution thro ugh a system of ordinary 
differential equations (ODEs, e.g . lJones et all 120101 ) or stochastic differential 
equations (SDEs, e.g. IWilkinsonl . 120061 ). Models based on ODEs are only ap- 
propriate for very large systems f or which the stoch asticity in the evolution 
is small. For medium-size systems IWilkinsonl (120061 ) shows that SDE models 
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are more appropriate and can lead to sensible estimates of the reaction rates. 
However inference for SDE models is non-trivial, as the transition density of 
general SDEs is unknown. 

In this paper we use an alternative approximation, known as the Linear Noise 
Approximation (LNA). The LNA is obtained through first approximating 
the dynamics by a system of ODEs, and then modelling the evolution of the 
state about the deterministic solution of the ODE, through a linear SDE. 
Simulations suggest that this approach has similar accuracy to modelling 
the system directly through SDEs, but it has the important advantage that 
under the LNA the stochastic model for the states is a Gaussian process. This 
means that transition densities are Gaussian, and their mean and covariance 
can be obtained through solving a system of differential equations. Using the 
LNA is thus a more accurate approach than modelling the system via ODEs, 
and it is substantially easier to perform inference than under a general SDE 
model. 

The structure of the paper is as follows. The next section provides more 
information on reaction networks and details three particular examples that 
will be revisited: the Lotka-Volterra predator-prey system, the SIR epidemic 
model, and an autoregulatory gene network. Section [3] examines the different 
possible approximations to the evolution of reaction networks: ODE approx- 
imation; SDEs approximation and the LNA. We give results of an empirical 
study which shows the accuracy of the LNA is often similar to that of the 
SDE. Section H] shows how we can calculate likelihoods under the LNA for 
a range of observation models, and suggests a simple way of embedding this 
within MCMC to perform Bayesian i nference. The use of the L NA for infer- 
ence has been previously suggested by IKomorowski et al\ ( 120091 ). however we 
show that their implementation means that the accuracy of the approxima- 
tion can deteriorate over time and can result in poor parameter estimates. 
We evaluate the use of the LNA approximation empirically on both simulated 
and real data. Results show that our use of the LNA l eads to more accurate 
inference than either that of IKomorowski et al\ ( 120091 ) or methods based on 
an ODE approximation. We als o compare with the SDE-based algorithm of 



Golightly and Wilkinson! (120051 ) . The accuracy of the LNA and SDE-based 



approaches are similar, with the relative performance of the two approaches 
varying depending on which reaction rates are being estimated. The advan- 
tage of the LNA approximation is partly computational, and also that there 
is no need for tuning the level of time-discretisation used in approximating 
the SDE transition density. The article concludes with a discussion. 
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2 Reaction networks 



Consider a general reaction of the form B + C — > D, where the number of 
elements of species B and C are respectively Xb and Xc and where the 
elements are distributed uniformly at random throughout some volume of 
space. The reaction occurs with some fixed probability whenever an element 
of species B is within some "reaction distance" of an element of C; occurences 
of the reaction may therefore be modelled as a Poisson process. With further, 
system dependent, assumptions, such as that elements occupy a fixed volume 
at a constant temperature, the rate, h, of the process is proportional to 
XbXc- Applying a similar argument, the rate of a reaction such as B — > 
C is simply proportional to Xb and the rate of 2B — > C is proportional 
to 0.5Xb (X^ — !) • For a fuller discussion of mass-action kinetics see, for 



example, I Gillespie! ( 120051 ). 

Consider now a network of n r such reactions each involving at least one of the 
n s species in the population. The dynamics of this model can be described by 
a vector of rates of the reactions together with a matrix which describe the 
effect of each reaction on the state. We denote by h the 77, r -vector of reaction 
rates. Now define be the net effect on species j of a single occurence 
of reaction i: so Ay = means that the number of species j is unaffected 
by reaction i, whereas Aij = 1 (or —1) means the number of species j will 
increase (or decrease) by 1. The n r x n s matrix A is known as the net effect 
matrix. An equivalent way of defining the effect of a set of reaction is via 
the stoichiometry matrix, which is the transponse of A. 
Example 1: the Lotka- Volterra model 



The Lotka-Volterra model (ILotkal . Il925l ; IVofterral . Il926l ) describes a popula- 
tion of two competing species: predators which die with rate 9 2 and reproduce 
with rate Q\ by consuming prey which reproduce with rate #3. In its simplest 
form the probabilistic system is defined by the following three reactions. 

.Ri : Pred + Prey — > 2Pred; R 2 : Pred — > <fi; R 3 : Prey — > 2Prey. 

Denoting Pred as X% and Prey as X 2 gives the vector of reactions rates: 

h := (#1X1X2, # 2 Xi, #3X2) . 

The net effect matrix for this example is A, where 

A' = 



-1 
1 
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Here, and throughout, we use ' to denote the transpose of a matrix. So, for 
example, the first column of A' (equivalently the first row of A) shows that 
the first reaction results in an increase of X\ by 1, and a decrease in X2 by 
1. 

Example 2: SIR epidemic model 

A special case of the Lotka-Volterra model is the Susceptible- Infected- Removed 
(SIR) epidemic model, which is as above but without the third reaction (or 
equivalently #3 = 0). The intepretation here is that the the two species are 
susceptible and infected individuals. The reactions relate to an infection of a 
susceptible, and the removal of an infected. Denoting the number of infected 
by X\ and the number of susceptible by X 2 we have 



h := (6 1 X 1 X 2 , 9 2 X X ) , and A' 



-1 




Example 3: prokaryotic autoregulatory gene network 

The following system describes the self-regulating production of a protein, 
P, and its dimer, P 2 . The sys tem is ana l ysed in iGolightly and Wilkinson 



(120051 ) and is also discussed in I Wilkinson! (120061) . while a similar system is 
analysed in IGolightly and Wilkinson! (12008! ). Reactions R\ and R 2 describe 
the reversible process whereby the protein dimer P2 binds to the gene (which 
we denote as DNA) and thereby inhibits the production, by reactions R3 and 
i?4, of the protein, P. Dimerisation of the protein and the reverse reaction are 
described by Reactions R 5 and R 6 , while R7 and R 8 describe the destruction 
of the protein and of the enzyme RNA-polymerase, which is denoted RNA. 



Ri 
R3 
R 5 

Rr 



DNA + P 2 -> DNA • P 2 R 2 

DNA -> DNA + RNA R 4 

2P ->• P 2 R 6 

RNA R 8 



DNA ■ P 2 -> DNA + P 2 . 
RNA RNA + P. 
P 2 2P 
P -> 0. 



From Reactions Ri and R 2 the total amount of DNA and DNA • P2 is fixed 
throughout the evolution of the system; we denote this constant by k and 
henceforth do not directly track the number of molecules of DNA • P2. De- 
noting the number of molecules of DNA, RNA, P, and P 2 as X\, X 2 , X 3 , 
and X 4 respectively gives the following vector of reaction rates: 



h:= \ e x x x x^ e 2 (k-x 1 ), e 3 x 1 , e A x 2 , -e 5 x 3 (x 3 



1), 9 6 X 4 , 8 7 X 2 , 6> 8 X 3 
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The net effect matrix for this example is A, where 



A' = 



-1 


1 
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-1 
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3 Approximations for network evolution 

We first consider the ODE and SDE approximations to the true process, and 
then sketch the justification for the Linear Noise Approximation (LNA), and 
finally provide a comparison between the distributions obtained by simulating 
from the true process, the SDE approximation and the LNA. 
It will be helpful to denote the n^-vector holding the number of molecules of 
each species by X and to define the n r xn r reaction rate matrix H := diag(h). 



3.1 The ODE and SDE approximations 

In an infinitesimal time dt the mean and variance of the change i n X due to 
all of the n r independent Poisson processes can be calculated as (jWilkinsonl . 



2006|): 

E [dX(t)) = A'h dt, Var [dX(t)] = A'HA dt. 

The ODE approximation to the evolution ignores the stochasticity of the 
model and is based on solely on the expected change in the mean. This gives 
the following differential equation 

^ = A'h(X(i),0). 

The SDE approximation models stochasticity through 



dX(t) = A'h(X(i), 0)dt + ^/A'H(X(t), 0)A dW(t) 



where the n s x n s matrix \J A'H(X(i), c) A is any matrix square root, such as 
that obtained by Cholesky or spectral decomposition, and W(t) is Brownian 
motion. 

The ODE model is deterministic, and fitting the model generally involves 
estimating both the initial condition and parameter values that give the best 
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fit to the data. Often the fit to the data is quantified by the sum of the 
square residuals. 

By comparison, parameters of the SDE model can be estimated using likelihood- 
based methods, such as maximum likelihood or calculating a posterior dis- 
tribution. The complication with this is that, generally, the SDE model 
will not lead to a tractable distribution for X(t) given X , and hence these 
models have an intractable likelihood. To overcome this it is common to 
approximate the t ransition density of the S DE, for example by the Eu- 
ler approximation (IKloeden and Platen! Il992l ). The Euler approximation is 
only accurate over small time-intervals. The implementation of these meth- 
ods therefore involves discretising time between each observation, and us- 
ing computationally-intensive methods that impute values of the state at 
both the ob servation times and the grid of times between each observation. 
For example iGolightlv and Wilkins on (20 051) implement such a method with 
an MCMC scheme, and IGolightlv and Wilkinson! (120061 ) within a sequential 
Monte Carlo algorithm. There is a considerable computational overhead in 
implementing these methods which increases with the fineness of the grid 
of time-points between observations, and this has led to much resear ch on 
efficient MCMC and other methods. See iRoberts and Stramerl (120011 ) for a 
discussion of ho w the fineness of the grid can aff ect mixing of the MCMC: 
and, for example, IGolightlv and Wilkinson! (120081 ) for details of more efficient 
MCMC approaches. 



3.2 The Linear Noise Approximation 

The Linear Noise Approximation (LNA) approximates the dynamics of the 
network by another SDE which has tractable transition densities between 
observation times; inference therefore does not require any data augmenta- 
tion. The LNA first app eared as a fu nctio nal cent r al lim it law for density 
dependent processes; see iKurtz (Il970h and iKurtd ril97lh for the technical 
conditions. 



Whilst iKurtzl (119701 ) and iKurtzl (1197lh justify our use of the LNA it will 
be more helpful in the present context to considered the LNA as a general 
approximation to the solution to an SDE, and then derive the LNA as an 
approximation to the SDE model derived in the previous section. The idea 
of the LNA is that we partition X(t) into a deterministic path, rj(t), and 
a stochastic perturbation from this path. Under the assumption that the 
perturbation is "small" relative to the deterministic path the distribution of 
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an approximate solution at any given time point is found by solving a series 
of ODEs. In our applications the deterministic path is just the solution of 
the ODE model introduced in the previous section. Here we provide a short 
heuristic motivation of the approximation; for a more rigorous deriva tion and 



more detailed discussion the reader is referred to lFerm et al\ (120081 ). 



Consider the general SDE for vector X of length n s 

dX(t) = a(X(t)) dt + eS(X(t)) dW(t), (1) 
with initial condition X(0) = X . Let 77 (t) be the (deterministic) solution to 

f = a(„) (2, 

with initial value r) . We assume that over the time interval of interest 
||X - 77H is 0(e). Set M(t) = (X(t) - rj(t)) / e and Taylor expand X(t) about 
r)(t) in p]). Collecting terms of 0(e) gives 

dM(t) = F(t)M(t) dt + S(t) dW(t), (3) 

where F is the n s x n s matrix with components 

da; 



Fij(t) 



dxj 



, and S(t) = S(ry(t)). 

T)(t) 



The use of e in (JTJ) is purely to indicate that the stochastic term eS(X(i)) is 
"small" and to aid in the collection of terms of similar size. Henceforth it 
will be simpler to set e = 1 and assume that S(X(t)) is "small". The initial 
condition for (j3J) is therefore M = (X — r/ ). 

Provided that X has either a point mass at x or has a Gaussian distribution, 
all increments in ([3]) will be a linear combination of Gaussians; thus M(£) 
has a Gaussian distribution for all t. The mean and variance of this Gaussian 
distribution can be obtained by solving a series of ODEs (see Appendix A): 

dm ., 

- = Fm, (4) 
d^f 

— = *F' + F* + SS*, (5) 

(JjL 

where m(t) := E [M(t)], and := Var [M(t)). Note that if m(0) = then 
m(t) = for all t > 0. Therefore initialising 77(0) = x(0) removes the need 
to integrate OH). 
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3.3 Comparison of transition probabilities 



The SDE and the LNA can be viewed as nested approximations to the evolu- 
tion of the reaction network. The accuracy of the transition densities arising 
from these two approximations is now investigated for the autoregulatory 
model (Example 3). Both the LNA and the SDE arise from a continuous ap- 
proximation to a discrete process, with the LNA also assuming that stochastic 
variations are small compared to the magnitude of the process itself, and we 
might therefore expect both approximations to improve as the system size 
increases. Three different system sizes are therefore compared with initial 
conditions x(0) = (50, 80, 80, 80, 50) for O G {1, 10, 100}. 
The rate parameter vector was set to = (O.l/O, 0.7, 0.35, 0.2, O.l/O, 0.9, 0.3, 0.1) 
so as to keep the behaviour of the system (and of th e ODE ([2])) consistent 
with the system examined in lGolightly and Wilkinson! ( 120051 ). which uses the 
above initial state vector and rate vector with 0=1. The true system was 
repeatedly simulated forward and the distribution of species was stored after 
ti = 0.1 time unit, t 2 = 0.5 time unit and £3 = 2.5 time units. The SDE was 
integrated forward using an Euler-Maruyama time step of 0.001 time units. 
Finally the parameters for the multivariate Gaussian transition density for 
t\, t 2 and £3 were estimated by integrating forward the ODEs ([2]) and (J5j). 
Comparison of the transition parameters are shown in Figure [TJ The error 
induced by representing the true system with a continuous approximation 
can be seen for the small and medium system sizes. However, throughout 
both the LNA and SDE models give, by eye, a reasonable approximation to 
the true transition density and there appears to be little difference in the 
accuracy of the two approximations. The main difference is that the LNA 
always gives a Gaussian approximation, whereas the SDE transition density 
can be non-Gaussian. Qualitatively similar results have be en observed for a 
variety of models and parameter values, see iGiagosI ( 12011al ). 



4 Inference using the LNA 

We now describe how we use the LNA to perform inference. We first briefly 
describe inference when we observe a system exactly at a discrete set of 
times. In this case, using the LNA to approximate the likelihood is particu- 
larly simple. We then show how to perform inference under a more general 
model where only a subset of species are observed, and these potentially are 
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Small t = 0.1 Small t = 0.5 Small t = 2.5 




470 480 490 500 510 520 460 480 500 520 540 460 480 500 520 540 560 



Figure 1: Estimated transition probabilities for the small (top row), medium 
(middle row) and large (bottom row) system for t = 0.1 (left column) t = 0.5 
(middle column) and t = 2.5 (right column) using exact simulation (solid black 
line) , the CLA (dashed black line) and the LNA (solid grey line) . For samples with 
fewer than 25 different outcomes vertical bars are used to represent the relative 
probability of each outcome, whereas kernel density estimates are used for samples 
with at least 25 different outcomes. 
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observed with error. We focus on the situation where linear combinations of 
the species are observed with Gaussian error: as in this case the LNA gives 
a tractable approximation to the likelihood. Finally we discuss similarities 
of our approa ch with an alternat i ve me thod of using the LNA for inference 



introduced by iKomorowski et al\ (120091 ) 



4.1 The fully and exactly observed system 

First consider the situation where at each of a discrete set of times, ij (i = 
0, . . . ,n), the system, Xj, is observed completely and without error. Let the 
true transition density of the system be denoted by 7r(xj|xj_i, 9) and the 
LNA approximation of this by klna{ x 4\ x 4-i-i in practice we use the LNA 
approximation obtained using 77 = Xj, m = 0, ^ = 0. 
For a fully observed system, we have that the likelihood factorises as 

n 

L(0)=Ij7T(x i |x i _ 1 ,0). 

i=l 

This motivates using the following approximation 

n 

L LNA {9) = Y\_n LNA (xi\xi^i,9). 



i=l 



This approximation is thus a product of Normal densities, with the mean 
and covariances depending on the parameters. It is possible to maximise 
this likelihood numerically. Alternatively, if we introduce priors for 9, tt{9), 
we can use standard MCMC algorithms to sample from the corresponding 
approximation to the posterior which is proportional to ti(9)Llna{9). 
The accuracy of estimators obt ained by maximising Llna{9) has been exten- 



sively studied in lGiagosi (l2011al ). for both the Lokta-Volterra model (Example 
1), and the auto-regulatory model (Example 3). The method gave reliable 
point estimates of parameters, and reasonable estimates of uncertainty (cov- 
erage of 95% confidence intervals was generally 90% for small systems, and 
close to 95% for large systems). 

4.2 Partially-observed systems 

Now assume that we have partial observations yo, . . . ,y n from times = 
t , . . . ,t n = T, where the conditional distribution for the observations given 
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the true process is 

Y i |x i ~iV(P(6>)x i ,V(0)). 

In our particular context the matrix P(0) simply removes certain compo- 
nents of Xj and leaves the remaining components unchanged; an operation 
that requires no parameterisation. Nonetheless the algorithm is valid for 
more general P(0) such as when the observations are an unknown but fixed 
multiple of the true values (with or without Gaussian error). We also in- 
troduce a prior, X ~ N(fj, , S ). To simplify notation, in the following we 
drop the explicit dependent of the matrices P(0) and V(0) on 0. We will 
also use y 0:i = (y , . . . ,yi). 

Approximating the Likelihood using the LNA 

To calculate the likelihood in this setting we write 

n 

L(0) = 7r(y ) Y[ 7r(y J |yo: J -i)- 
i=i 

We now show how we can use the LNA to obtain approximations to each 
term on the right-hand side. 

Firstly note that 7r(y ) can be calculated directly from our model as 

Y ~iV(P Mo ,PS P* + V), 
and that standard results give X |yo ~ N(nl, Sq), where 

fi* = / x + S P t (PS P t + V)- 1 (y -P / x ) 

£* = So-SqP* (PS P t + V)" 1 PS . 

Now we calculate approximations to vr(yj|y 0: j_i) recursively for % — 1, . . . , n. 
This can be done using Kalman filter recursions and the LNA, and involves 
repeating the following steps: 

1. Obtain the predictive distribution at time t^ 
We will have that for suitable //*_! and 

X 4 _ 1 |y 0rf - 1 ~iV(^_ 1 ,sr_i). 

We then intiatie the LNA with with rj(ti-i) = so that m(^_i) = 
0, and = S*_!. As m^) = 0, we have m(t) = for all t > U. 

Integrating the ODEs (J2J) and (JSJ) forward for time ti — U-x provides 
T](ti) and *f?(ti). Hence our approximation to the transition density is 
Xi|y ls i_i ~ N (fjt it S^, where [i i = rj(ti) and = *(£;). 
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2. Calculate approximation to vr(yj|y 0: (j_i), 9). 
Using Yj = PXj + tj, where ej ~ N(0, V) directly gives 

Y J |y 0;i -i~iV(P Mi ,PS l P t + V). 

3. Calculate approximation to 7r(xj|y 0: ;). 
Since 



Y,; 



yi:(i-l) 



N 





Mi 






( 




i 


_ PSi PS 4 P* + V 



we have directly that Xj|y 1: j ~ N([i*, £*), where 

tA = ^ + S i P*(PS J P* + V)- 1 (y J -P Ail 
£* = Si-S^P* (PS i P* + V) _1 PS i . 



(6) 



(7) 
(8) 



For each z, step (2) of this scheme gives us an approximation to ^(y^yo^x), 
which we denote ttlna (yi|yo : j-i)- We then get our approximation for the 
likelihood of the data as 



L L na(0) = 7r(y \9)Y[^LNA(yi\yoy, 



-1, 1 



(9) 



i=i 



The only approximation in Llna{@) is due to using the LNA for the transition 
density of the system, which gives us the Normal approximation to Xj|y 1: j_! 
from the Normal approximation to Xj_i|yi : j_i. 
MCMC scheme 

It is possible to estimate the parameters by numerically maximising the LNA 
approximation to the likelihood (]9]). However we consider a Bayesian analy- 
sis. We introduce priors for the parameters, ir(0), and use MCMC to generate 
samples from the resulting approximation to the posterior ti(6)L LN a(6)- 
We implemented a random-walk Metropolis algorithm. Each iteration of the 
MCMC algorithm involved a single block update of all the log-parameters. 
Using the log-scale is natural as all parameters are rate or variance parame- 
ters, and hence positive; a log-transform then maps the parameters values to 
the real line. We used pilot runs to tune our algorithm. The variance of the 
random-w alk proposal was tuned to prod uce an acceptance rate in the range 
0.25-0.30 flRoberts and Rosenthal l200lh . Note that using adaptive MCMC 
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(lAndrieu and Thoma . 120081 ) methods is possible, and would remove the need 

for manual tuning. 

Implementation 

The ODEs required for calculating 7r(yj|yo:(i_i)) can be soved numerically. 
Care is needed as in many applications the ODEs are stiff ( IHairer and Wannerl . 



19911 ) . There are standa rd numerical r outines for solving stiff ODEs, and we 
used the lsoda package (jPetzoldl . 119831). Code for imp lementing the LNA for 



simulation is available as an R package (jGiagoa . 1201 lbl ) ; and code for perform- 
ing inference using the LNA is available on request from the corresponding 
author. 



4.3 Alternative use of the LNA 



The use of the LNA for Baye sian inference of stochasti c kinetic networks has 
been previously suggested by iKomorowski et al\ (120091 ) , bu t their implemen- 
tation has important differences from ours. The approach of lKomorowski et al. 
(120091 ) involves using the LNA to obtain an approximation for the joint dis- 
tribution of X 1:n = (Xi, . . . , X n ) conditional on a value for x . This can 
be combined with the linear-Gaussian relationship between each observation 
Yj and state-value Xj to give an approximation to the likelihood for data 
yi :n in terms of the parameters, and the initial value, x . They introduce 
priors for the and x , and sample from the (approximate) posterior for 
these using MCMC. 

In p ractice the most importan ce difference between this approach and ours, is 
that IKomorowski et all ( 120091 ) use the LNA over a time period [0, t n ] obtained 
from solving the ODE approximation to the model over this period for a 
given initial condition. By comparison we use a different LNA for each time- 
interval essentially restarting the LNA using our best estimate of 
x t _i given yo : t-i as the initial condition for the ODE. This difference can be 
important for some models, as the ODE solution can becom e poor over long 
time-periods. Thus the approach of lKomorowski et all ( 120091 ) can give a poor 
approximation to the distribution of X t for larger values of t. By re-starting 
the LNA method over each time-interval we help avoid the problems of the 
approximation getting worse for larger t. 

The difference in accuracy of the two approaches for using the LNA is investi- 
ga ted thoroughly for systems which are fu lly observed at disc r ete tim e-points 



m 



Giagosl (1201 la ): where the method of lKomorowski et al\ (120091 ) was less 



accurate, for both point and interval estimation, than the method we intro- 
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duce above. We further demonstrate the increased accuracy of our approach 
for partially-observed systems in Section 



5 Simulation study 

We now empirically evaluate the performance of inference of using the LNA 
for inference of parameters in both the Lotka-Volterra model (Example 1) 
and the auto- regulatory model (Example 3). Our aim is to both compare our 
approach with inference b ased on ODE approximations, t he LNA approach of 



Komorowski et all (120091 ) . and the SDE-based approach of lGolightly and Wilkinson 
( 20051 ); and to evaluate the accuracy of using the LNA for both point and 



interval estimation. 

In assessing accuracy of a method on a given model, throughout we present 
results based on analysing 100 different data sets from a given model and 
set of parameter values. We present results in terms of estimating the log 
of the parameter values since interest is primarily in the order of magnitude 
of the rates of reactions. As estimates of the parameters we used the pos- 
terior median, as this is both invariant to monotonic transformations of the 
parameters, and is robust to heavy-tailed posterior distributions. 
Comparison with ODE method and Komorowski et al. 
Our first comparison is with both approximating the model using an ODE, 
and with the LNA method of Komorowski et al. Our comparison is based on 
the Lotka-Volterra model. We simulated data with Xo = (40, 140) and with 
6 1 = 0.01, 62 = 0.6 and 63 = 0.3. Observations were made every second for 
30 seconds, or until one of the species went extinct. We used Gamma(2,10) 
priors for all parameters, which gave a reasonable prior mass across the range 
of values the rates take. We assumed that predators are observed exactly, 
and the prey unobserved. 

The ODE method uses a log-likelihood which depends on the sum of squared 
errors between the solution of the ODE and the observations, and is equiv- 
alent to modelling the observations as having additive Gaussian error. Note 
that both LNA methods have a similar computational cost, while the ODE 
approach has a smaller computational cost (as the differential equations for 
the variance in the LNA need not be solved). 

Results are shown in Table [TJ The new LNA approach is uniformly better at 
estimating each parameter in terms of both accuracy of point estimates and 
coverage of credible intervals. Note that the approach of Komorowski et al. 
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Table 1: Comparison of our approach (LNA), the approach of Komorowski 
et al. (KOM) and the ODE approach (ODE). We present results in terms of 
the mean performance across 100 data sets. 







0i 


02 


0s 




logio 


-2.000 


-0.222 


-0.523 


Mean of 


LNA 


-2.001 


-0.248 


-0.541 


posterior 


KOM 


-2.000 


-0.263 


-0.540 


medians 


ODE 


-2.007 


-0.308 


-0.540 


Mean abs. 


LNA 


0.043 


0.056 


0.050 


error of 


KOM 


0.074 


0.086 


0.070 


median 


ODE 


0.080 


0.126 


0.098 


Mean 


LNA 


0.193 


0.188 


0.198 


width of 


KOM 


0.138 


0.153 


0.156 


95% CI 


ODE 


0.141 


0.158 


0.194 


Coverage 


LNA 


93 


84 


88 


of 


KOM 


51 


54 


55 


95% CI 


ODE 


43 


29 


30 
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is in turn more accurate than using an ODE approximation. These results 
are for a small system, and the difference between the different approaches 
will be less if a larger system size is studied. Further results comparing the 
new LNA approach and th e approach of K omorowski et al., for fully observed 
systems, can be found in iGiagosI (j2011al ). Those results also show that the 
new LNA approach is more accurate. 

To see why the new LNA approach is more accurate than either that of 
Komorowski et al. or that of using an ODE approximation, it is useful to 
see results from a single dataset. In Figure |2] we show a simulated data set, 
together with the best fitting ODE solution. This solution gives a poor fit to 
the data, and as the LNA of Komorowski is based on such ODE solutions, 
that approach gives a poor approximation to the likelihood. By comparison, 
as our approach re-starts the LNA at each observation, we are able to get a 
good approximation to the likelihood terms across the whole time-period of 
the data. 

Comparison with SDE approach of Golightly and Wilkinson 

We now compare our method with an approach base d on an SDE approxi- 
mation to the model (jGolightly and Wilkinson! . 120051 ). We compare on the 
auto-regulatory model (Example 3) with observations every half a second for 
25 seconds, and considered two sets of parameter values, detailed in Tables 
[2] and [3] respectively. The second set correspond to all the reaction rates of 
the first set being multiplied by 4. We considered models where all compo- 
nents of the syste m are observed without error, a s the code that implements 
the approach of (jGolightly and Wilkinson! . 120051 ) assumes this observation 
model. We assumed independent half-cauchy priors, 7r(6 , i ) oc 1/(1 + (4#j) 2 ) 
for 9i > for all i. 

The method of lGolightly and Wilkinson! (12005! ) involves imputing the path of 
the SDE at m — 1 time-points in between each observation. We present results 
for m = 10, which gave a good trade-off between accuracy and computational 
efficiency 

As well as comparing the accuracy of point and interval estimates for the 
two methods, we also compare the computational efficiency. To do this we 
calculated the auto-correlation time for estimating each parameter, and from 
this the Effective Sample Size (ESS). We summarise results in terms of the 
ESS divided by CPU time. 

Results for the first set of parameter values is given in Table [2j The com- 
parative performance of the two methods varies with parameter. For the 
parameters #3, (9 4 , #7 and 6*8, both methods are similar in terms of accuracy 
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Figure 2: The true process (solid line) with observed values marked as triangles 
and posterior mean values at observation times marked as circles. The dashed line 
shows the LNA solution for the deterministic process integrated forward from the 
starting values, as used by Komorowski et al.; dotted lines show the LNA solution 
to the deterministic process integrated from the observed or posterior mean value 
at each observation time, as used in our LNA method. The top plot corresponds 
to predators (observed) the bottom plot to prey (unobserved). 
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Table 2: Comparison of our LNA approach with that of the SDE-based 
method of Golightly and Wilkinson (2005). 







di 


o 2 


03 


6> 4 


0b 


0q 


6> 7 


08 




logio 


-1.000 


-0.155 


-0.456 


-0.699 


-1.000 


-0.046 


-0.523 


-1.000 


Mean of 


LNA 


-0.921 


-0.092 


-0.466 


-0.687 


-0.856 


-0.076 


-0.523 


-0.974 


post. med. 


GW05 


-0.975 


-0.135 


-0.445 


-0.643 


-0.972 


0.036 


-0.506 


-0.941 


Mean abs. err. 


LNA 


0.147 


0.139 


0.093 


0.090 


0.197 


0.183 


0.082 


0.107 


of med. 


GW05 


0.107 


0.107 


0.097 


0.103 


0.100 


0.120 


0.086 


0.118 


Mean width 


LNA 


0.730 


0.721 


0.408 


0.410 


1.065 


1.047 


0.410 


0.429 


of 95% CI 


GW05 


0.533 


0.534 


0.441 


0.428 


0.565 


0.600 


0.413 


0.437 


Coverage of 


LNA 


94 


94 


89 


92 


94 


95 


92 


88 


95% CI 


GW05 


95 


93 


91 


86 


97 


97 


91 


86 


Mean 


LNA 


2.28 


2.67 


0.79 


1.04 


1.24 


1.24 


0.97 


0.56 


ESS/sec 


GW05 


0.18 


0.18 


0.60 


1.10 


0.09 


0.08 


0.68 


0.82 



and size and coverage of credible intervals. Coverage of credible intervals 
are consistent with their nominal size. Computational efficiency of the two 
methods is also very similar, with a mean ESS per second of 0.84 and 0.80 
for the LNA and SDE methods respectively. 

However inference of the remaining parameters differs considerably. The SDE 
method is uniformly more accurate, and has substantially smaller credible 
intervales in all cases. Coverage of credible intervals is close to 90% for 
these parameters, but that is similar for both methods. The computational 
efficiency of the LNA is substantially higher for these parameters with the 
ESS per second an order of magnitude greater. 

Note that the parameters for which the inferences of the two methods differ 
consist of two pairs of reaction rates: 6\ and # 2 are the rates of reversible 
reactions linked to the product DNA • P 2 , where as 9$ and 6q are the rates 
of reversible reactions linked to the dimerisation of the protein. As such 
we would expect difficulty in estimating each rate individually, as increasing 
both #i and 9 2 , say, together would lead to similar data. The larger credible 
intervals produced by the LNA are consistent with this. Investigating the 
performance of the SDE approximation shows that the Euler discretisation 
becomes increasingly inaccurate as, say, both 9\ and 82 increase. This leads 
to a poor estimate of the posterior for large values of these rates. 
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Table 3: Comparison of our LNA approach with that of the SDE-based 
method of Golightly and Wilkinson (2005). The true reaction rates are all 
four times larger than for Table 2 







0i 


02 


03 


0a 


5 


06 


07 


08 




logio ® 


-0.398 


0.447 


0.146 


-0.097 


-0.398 


0.556 


0.079 


-0.398 


Mean of 


LNA 


-0.592 


0.229 


-0.099 


-0.324 


-0.596 


0.324 


-0.164 


-0.638 


post. med. 


GW05 


-0.674 


0.160 


0.180 


-0.037 


-0.804 


0.219 


0.090 


-0.345 


Mean abs. err. 


LNA 


0.216 


0.235 


0.257 


0.233 


0.205 


0.236 


0.252 


0.246 


of med. 


GW05 


0.276 


0.287 


0.110 


0.086 


0.407 


0.337 


0.102 


0.091 


Mean width 


LNA 


1.258 


1.263 


0.559 


0.669 


1.359 


1.356 


0.564 


0.716 


of 95% CI 


GW05 


0.500 


0.502 


0.490 


0.350 


0.460 


0.500 


0.458 


0.360 


Coverage 


LNA 


100 


100 


59 


79 


99 


99 


59 


75 


of 95% CI 


GW05 


40 


40 


90 


90 





5 


93 


91 



To investigate this further we considered a second example, where the rates 
of all parameters was increased by a factor of 4. This pushes the valus of 9\, 
02 , 05 and 0q into the upper tail of the prior, and gives insight into whether 
the method of lGolightly and Wilkinson! (120051 ) is producing appropriate cred- 
ible intervals for these parameters, or whether the method is biased towards 
smaller parameter values, which happened to be consistent with the true 
parameter values used for the first simulation study. 

Results are given in Table |3j In this case we observe poor infere nce for Q\ , 
02, 05 and 0q using the method of lGolightly and Wilkinson! ( 120051 ): accuracy 
is lower than using the LNA, and the credible intervals are too small, leading 
to coverage probabilities le s s than 0.5 in all cases. However, the method 
of iGolightly and Wilkinson jiooi) does give more accurate inferences for 
the remaining parameters. This is likely to be because the extra variability 
arising from the larger rates means that the perturbations of the system from 
the ODE solution are no longer small. 
Accuracy of the LNA method 

We further investigate the accuracy of the LNA, by repeating the analysis of 
the auto-regulatory example, but considering different observation models. 
We considered all components observed with error, and only three compo- 
nents observed either exactly or with Gaussian error. We use the same priors 
as above. 
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Table 4: Accuracy of our LNA approach on the auto-regulatory example. 
Results are for all components observed with Gaussian error (4GE) and three 
component observed either exactly (3NE) or with Gaussian error (3GE) 







n 
vi 


n 
V2 


n 
#3 


n 
#4 


n 
Vb 


n 
#6 


n 

U 7 


n 
#8 




logio 9 


-1.000 


-0.155 


-0.456 


-0.699 


-1.000 


-0.046 


-0.523 


-1.000 


Mean of 


4GE 


-0.860 


-0.035 


-0.439 


-0.612 


-0.842 


0.086 


-0.497 


-0.905 


posterior 


3NE 


-0.737 


-0.147 


-0.320 


-0.542 


-0.843 


0.088 


-0.539 


-0.821 


medians 


3GE 


-0.623 


-0.117 


-0.279 


-0.555 


-0.833 


0.093 


-0.541 


-0.833 


Mean abs. 


4GE 


0.211 


0.202 


0.153 


0.195 


0.207 


0.192 


0.143 


0.212 


error of 


3NE 


0.289 


0.157 


0.170 


0.223 


0.206 


0.190 


0.083 


0.229 


median 


3GE 


0.395 


0.142 


0.205 


0.234 


0.217 


0.201 


0.158 


0.239 


Mean 


4GE 


1.044 


1.043 


0.880 


1.079 


1.215 


1.202 


0.808 


1.180 


width of 


3NE 


1.340 


1.187 


0.829 


1.154 


1.217 


1.202 


0.422 


1.128 


95% CI 


3GE 


1.885 


1.793 


1.242 


1.264 


1.325 


1.314 


0.893 


1.221 


Coverage 


4GE 


93 


93 


97 


95 


96 


98 


96 


91 


of 


3NE 


94 


100 


97 


90 


96 


98 


94 


92 


95% CI 


3GE 


100 


100 


96 


96 


97 


99 


97 


95 



Results are given in Table HJ and are comparable with those in Table [2j The 
LNA appears to be giving good inferences of the parameters. As we would 
expect, as we observed fewer species, or observe with error, the uncertainty 
in our estimates increases. Perhaps most importantly, the coverage rates we 
obtain are close to 95% in all cases, suggesting the method is giving a good 
estimate of uncertainty. Note that the we obtain higher coverage rates with 
less informative data, possibly because any bias in the LNA has less effect 
when we have higher posterior variance. 



6 Epidemic Data Analysis 



We now con sider analysing real-data under the SIR model (Example 2). 
Baileyl (119751 ). page 125 provides a complete set of 29 inter- removal times, 
measured in days, from a smallpox outbreak in a community of 120 individ- 
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uals in Nigeria: 



13, 7, 2, 3, 0, 0, 1, 4, 5, 3, 2, 0, 2, 0, 5, 3, 1, 4, 0, 1, 1, 1, 2, 0, 1, 5, 0, 5, 5. 

We can represent this data as the number of removed individuals at the end 
of successive days. We assume an SIR model (Example 2) for the data, with 
the observations being equivalent to daily measurements of X\ + X 2 , as there 
is a fixed population size. For simplicity we assume that a single individual 
remained infected just after the first removal occurred. We also observe the 
population for 10 days after the final removal. We analyse the data assuming 
there is no observation error (qualitatively similar results are obtained if we 
allow for Gaussian observation error). Again we use independent half-Cauchy 
priors, vr(^) oc 1/(1 + (100^) 2 ) for Q { > 0. 

One motivation for analysing this data is th at it is possible to perform in- 



feren ce under the true jump -Markov process ( Fearnh ead and Meligkotsidoul . 



2004 ). While the analysis in lFearnhead and Meligkotsidoul (120041 ) used a dif- 



ferent prior on the parameters and on the initial number of infectives, we have 
re-analysed the data using their exact method for the same initial condition 
and prior as above. 

A comparison of results using the true model and the LNA approximation 
are given in Figure [3j The true posterior medians and 95% credible intervals 
are -3.13, (-3.37,-2.89) for log 10 9 X and -1.13, (-1.38,-0.89) for log 10 # 2 . 
The LNA approximation provides medians and credible intervals of —3.06 
and (—3.16,-2.95) and —1.13, (—1.32,-0.95) for the two parameters re- 
spectively. The LNA gives similar point estimates for both parameters. The 
marginal posterior for 82 is also similar to the truth, but the LNA substan- 
tially underestimates the uncertainty in 9±. 



7 Discussion 

We have demonstrated how the LNA can be used to perform inference for 
reaction networks where all, or a subset, of components are observed. Obser- 
vations can either be exact, or with additive Gaussian error. Results suggest 
that using the LNA is more accurate than approximating the underlying 
model using an ODE. 

The LNA is based upon first obtaining a deterministic approximation to 
the path of the state vector over time; and the modelling the error about 
this deterministic approximation. Key to the error in the LNA being small 
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True 



LNA 





Figure 3: Comparison of true posterior and the LNA approximation to the pos- 
terior for epidemic data. Top row are contour plots of joint posterior for log 10 0\ 
and log 10 02 using the exact model (left) and the LNA (right). Bottom row are 
marginal posteriors for log 10 0\ (left) and log 10 #2 (right): true marginal posterior 
(full-line) and LNA posterior (dashed). 
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is that the deterministic approximation is accurate. We have shown that 
recalculating the deterministic solution between each pair of observations is 
mo re accurate than c a lculat ing a single deterministic solution as suggested 



by iKomorowski et al\ ( 120091 ). Furthermore, across the examples we looked 
at the LNA gives reliable inferences in almost all cases. The one exception 
(for a subset of parameters in the results in Table [3]), corresponded to cases 
where the noise in the model was high, leading to the peturbations of the 
system about the deterministic solution not being small. 
We have also compared with a method based on approximating the under- 
lying model using an SDE. The advantage of the LNA is one of simplicity - 
as the LNA gives an analytic form for the approximation to the transition 
density of the model. In particular there is no need to choose a level of 
time-discretisation. Calculating the LNA involves solving a set of ODEs, but 
standard routines exist for appropriately choosing and adapting the step-size 
using in numerically solving the ODEs. By comparison SDE methods cur- 
rently involve the user pre-specifying the level of time-discretisation. Choos- 
ing an appropriate level is difficult, partly because the required level needed 
to get an accurate approximation can depend on the parameter values, and 
these will change at each MCMC iteration. 

Acknowledgements: we are grateful to Dr. A. Golightly for supplying C 
code for inference on the completel y and exactly observed autoreg ulatory 



example using the methodology in iGolightly and Wilkinson! (120051 ) . Paul 
Fearnhead was supported by EPSRC grant EP/G028745. 
Appendix A 

Define m(t) := E[M(t)], G(t) := E [M(t)M(t)*] and *(t) := Var[M(t)] = 
G(i) — m(t)m(i)*. Then by the linearity of expectation and ([3]) 

dm(t) = E [dM(t)] = E [F(t)M(i) dt) = F(t)m(t) dt. 

dG(t) = E [M(t + dt)M(t + dt) 1 - M(t)M(t)*] = E [m{t)dM(tf + dM(*)M(t)* + dM{t)dM{t) 

= E [M(t)M(t)*F(t)* + F(t)M(i)M(t)* + S(*)S(t)*] dt = G(t)F(t)* + F(t)G(t) + S(*)S(t 

d*(t) = dG{t) - m(t) dm(tf - dm{t) m(t)* = *(t)F(t)* + F(t)*(t) + S(t)S(t)*, 

as required. 
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